Abstract

In this paper, we present a unifying framework which reduces the construction of probabilistic component analysis techniques to a mere selection of the latent neighbourhood via the prior, thus providing an elegant and principled framework for creating novel component analysis models. Under our framework, we unify many popular and well-studied component analysis algorithms, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA). We first show that the projection directions produced by all the aforementioned methods are also produced by the Maximum Likelihood (ML) solution of a single joint probability density function (PDF), just by choosing an appropriate prior over the latent space in our framework. Subsequently, we propose novel Expectation Maximization (EM) algorithms utilising the proposed joint PDF. Theoretical analysis and experiments show the usefulness of the proposed framework.

1. Introduction 

Unification frameworks in machine learning not only provide valuable material towards a deeper understanding of various methodologies, but also form a flexible basis upon which further extensions can be easily built. One of the first attempts to unify methodologies was made in (Roweis & Ghahramani, 1999). In this seminal work, models such as Factor Analysis (FA), Principal Component Analysis (PCA), mixtures of Gaussian clusters (MGC), vector quantization (VQ), Linear Dynamical Systems (LDS), Hidden Markov Models (HMM) and Independent Component Analysis (ICA) were unified as variations of unsupervised learning under a single basic generative model. A more recent work (Takeda et al., 2012) unifies algorithms such as the support vector machine (SVM), the minimax probability machine (MPM) and Fisher discriminant analysis (FDA) under a robust classification (uncertainty minimization) scheme.

In this paper we propose a framework which unifies several well-studied component analysis algorithms. In particular, we first formulate the joint (complete-data) probability density function (PDF) of a set of observations and latent variables. Subsequently, we show that the Maximum Likelihood (ML) solution of this joint PDF can produce the projection directions of PCA, LDA, LPP and SFA. This requires changing only the joint prior distribution of the latent variables, which in fact models the latent dependencies. For example, when using a fully connected Markov Random Field (MRF) as the latent prior distribution, we derive PCA. When choosing the product of a fully connected MRF and an MRF connected only to within-class data, we derive LDA. LPP is derived by choosing a locally connected MRF. Finally, SFA is produced when the joint prior is a linear Markov chain. Afterwards, based on the aforementioned PDF, we propose Expectation Maximization (EM) algorithms for learning the parameters of the model. Finally, using a set of both synthetic and real data, we demonstrate the usefulness and advantages of this family of probabilistic component analysis methods.
2. Prior Art and Novelties

An important contribution of our paper lies in the proposed unification of probabilistic component analysis techniques. To the best of our knowledge, this is the first framework that reduces the construction of probabilistic component analysis models to the design of a proper prior over the latent space. Essentially, the choice is reduced to selecting the latent neighbourhood.

In this section, we review the state of the art in probabilistic alternatives of PCA, LDA, LPP and SFA. While doing so, we highlight the other novelties and advantages that our proposed framework entails with respect to each alternative formulation. Throughout this paper we consider, without any loss of generality, a zero-mean set of $T$ $F$-dimensional observations $\{\mathbf{x}_1, \ldots, \mathbf{x}_T\}$, represented by a matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_T]$. All of these methods discover an $N$-dimensional latent space $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_T]$ which preserves certain properties of $\mathbf{X}$.

2.1. Probabilistic PCA 

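The deterministic model of PCA finds a set of projection bases $\mathbf{W}$, with the latent space $\mathbf{Y}$ being the projection of the training set $\mathbf{X}$ (i.e., $\mathbf{Y} = \mathbf{W}^T\mathbf{X}$). The optimization problem is as follows:

$$\mathbf{W}_o = \arg\max_{\mathbf{W}} \operatorname{tr}[\mathbf{W}^T\mathbf{S}\mathbf{W}], \quad \text{s.t. } \mathbf{W}^T\mathbf{W} = \mathbf{I} \quad (1)$$

where $\mathbf{S} = \frac{1}{T}\sum_{i=1}^{T}\mathbf{x}_i\mathbf{x}_i^T$ is the total scatter matrix and $\mathbf{I}$ is the identity matrix. The optimal $N$ projection bases $\mathbf{W}_o$ are the $N$ eigenvectors of $\mathbf{S}$ that correspond to the $N$ largest eigenvalues.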
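As an illustration of Eq. 1, the following is a minimal NumPy sketch. It assumes the observations are stored column-wise in an $F \times T$ array `X` that has already been centred. The function name `pca_directions` and the synthetic data are illustrative only. It returns the $N$ leading eigenvectors of $\mathbf{S}$ as the PCA bases.

```python
import numpy as np

def pca_directions(X, N):
    """Return the N leading eigenvectors (and eigenvalues) of S = (1/T) X X^T.

    X is an F x T array of zero-mean observations, one sample per column.
    """
    T = X.shape[1]
    S = X @ X.T / T                      # total scatter matrix
    evals, evecs = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:N]  # indices of the N largest eigenvalues
    return evecs[:, order], evals[order]

# Synthetic zero-mean data with decreasing variance per coordinate
rng = np.random.default_rng(0)
scales = np.array([[3.0], [2.0], [1.0], [0.5], [0.1]])
X = scales * rng.normal(size=(5, 500))
X -= X.mean(axis=1, keepdims=True)

W, lam = pca_directions(X, N=2)
Y = W.T @ X                              # latent space Y = W^T X
```

Because $\mathbf{S}$ is symmetric, `numpy.linalg.eigh` is used rather than a general eigensolver; its orthonormal output satisfies the constraint $\mathbf{W}^T\mathbf{W} = \mathbf{I}$ directly.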

Probabilistic PCA (PPCA) approaches were independently proposed in (Roweis, 1998) and (Tipping & Bishop, 1999). In (Tipping & Bishop, 1999), a probabilistic generative model was adopted as:

$$\mathbf{x}_i = \mathbf{W}\mathbf{y}_i + \boldsymbol{\epsilon}_i, \quad \mathbf{y}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \quad \boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}) \quad (2)$$

where $\mathbf{W}$ is the matrix that relates the latent variable $\mathbf{y}_i$ to the observed sample $\mathbf{x}_i$, and $\boldsymbol{\epsilon}_i$ is the noise, which is assumed to be isotropic Gaussian. The motivation is that, when $N < F$, the latent variables offer a more parsimonious explanation of the dependencies between the observations. The ML and EM solutions for the parameters and the moments $\mathbb{E}[\mathbf{y}_i]$ and $\mathbb{E}[\mathbf{y}_i\mathbf{y}_i^T]$ can be found in (Bishop, 2006; Tipping & Bishop, 1999). Several variations have been proposed since, e.g. by incorporating sparseness and non-negativity constraints (Sigg & Buhmann, 2008) or by utilising joint generative/regression frameworks (Yu et al., 2006).
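For reference, the ML solution of Eq. 2 has a well-known closed form (Tipping & Bishop, 1999). $\sigma^2$ is the average of the $F - N$ discarded eigenvalues of the sample covariance. $\mathbf{W} = \mathbf{U}_N(\Lambda_N - \sigma^2\mathbf{I})^{1/2}$ up to an arbitrary rotation, where $\mathbf{U}_N$ and $\Lambda_N$ hold the $N$ leading eigenvectors and eigenvalues. The sketch below makes three assumptions: the data are zero-mean and stored column-wise, the rotation is fixed to the identity, and the posterior moments take the standard PPCA form. That form is $\mathbb{E}[\mathbf{y}_i] = \mathbf{M}^{-1}\mathbf{W}^T\mathbf{x}_i$ and $\mathbb{E}[\mathbf{y}_i\mathbf{y}_i^T] = \sigma^2\mathbf{M}^{-1} + \mathbb{E}[\mathbf{y}_i]\mathbb{E}[\mathbf{y}_i]^T$, with $\mathbf{M} = \mathbf{W}^T\mathbf{W} + \sigma^2\mathbf{I}$. The function names are illustrative.

```python
import numpy as np

def ppca_ml(X, N):
    """Closed-form ML estimates (W, sigma2) of PPCA for zero-mean F x T data X."""
    T = X.shape[1]
    S = X @ X.T / T                                  # sample covariance
    evals, evecs = np.linalg.eigh(S)
    evals, evecs = evals[::-1], evecs[:, ::-1]       # sort in descending order
    sigma2 = evals[N:].mean()                        # mean of the discarded eigenvalues
    W = evecs[:, :N] * np.sqrt(evals[:N] - sigma2)   # U_N (Lambda_N - sigma2 I)^{1/2}
    return W, sigma2

def ppca_posterior(X, W, sigma2):
    """Posterior moments E[y_i] (N x T) and E[y_i y_i^T] (T x N x N)."""
    N = W.shape[1]
    Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(N))
    Ey = Minv @ W.T @ X
    Eyy = sigma2 * Minv + np.einsum('it,jt->tij', Ey, Ey)
    return Ey, Eyy

rng = np.random.default_rng(1)
X = np.array([[3.0], [2.0], [0.5], [0.3]]) * rng.normal(size=(4, 1000))
X -= X.mean(axis=1, keepdims=True)
W, sigma2 = ppca_ml(X, N=2)
Ey, Eyy = ppca_posterior(X, W, sigma2)
```

A quick consistency check is that the model covariance $\mathbf{W}\mathbf{W}^T + \sigma^2\mathbf{I}$ reproduces the $N$ largest eigenvalues of the sample covariance exactly.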



2.2. Probabilistic LDA 

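Let us now further assume that our data $\mathbf{X}$ are separated into $K$ disjoint classes $C_1, \ldots, C_K$, with $|C_c|$ samples in class $c$ and $T = \sum_{c=1}^{K}|C_c|$. Fisher's Linear Discriminant Analysis (FLDA) finds a set of projection bases $\mathbf{W}$ such that (Yan et al., 2007)

$$\mathbf{W}_o = \arg\min_{\mathbf{W}} \operatorname{tr}[\mathbf{W}^T\mathbf{S}_w\mathbf{W}], \quad \text{s.t. } \mathbf{W}^T\mathbf{S}\mathbf{W} = \mathbf{I} \quad (3)$$

where $\mathbf{S}_w = \sum_{c=1}^{K}\sum_{\mathbf{x}_i \in C_c}(\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^T$ and $\boldsymbol{\mu}_c$ is the mean of class $c$. The idea is to find a latent space $\mathbf{Y} = \mathbf{W}^T\mathbf{X}$ such that the within-class variance is minimized in a whitened space. The solution is given by the eigenvectors of $\mathbf{S}_w$ that correspond to the $N - K$ smallest eigenvalues, computed on the whitened data (i.e., after removing the variance by applying PCA).¹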
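Eq. 3 is a generalised symmetric eigenproblem, $\mathbf{S}_w\mathbf{w} = \mu\,\mathbf{S}\mathbf{w}$. It can therefore be sketched with SciPy's `scipy.linalg.eigh(a, b)`, which returns eigenvalues in ascending order. Its eigenvectors are normalised so that $\mathbf{W}^T\mathbf{S}\mathbf{W} = \mathbf{I}$, which is exactly the constraint of Eq. 3. The sketch makes three assumptions:

- $\mathbf{S}$ is positive definite (enough samples relative to $F$).
- The scatter matrices are unnormalised sums, which changes only the scale, not the directions.
- The helper name and the two-class toy data are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, labels, N):
    """Solve Eq. 3 for zero-mean F x T data X with integer class labels."""
    F = X.shape[0]
    S = X @ X.T                                   # total scatter
    Sw = np.zeros((F, F))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        Dc = Xc - Xc.mean(axis=1, keepdims=True)  # x_i - mu_c for x_i in class c
        Sw += Dc @ Dc.T                           # within-class scatter
    evals, evecs = eigh(Sw, S)                    # ascending, with W^T S W = I
    return evecs[:, :N], evals[:N], S, Sw         # keep the smallest eigenvalues

rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 200)
X = rng.normal(size=(3, 400))
X[0] += np.where(labels == 0, -3.0, 3.0)          # classes separated along axis 0
X -= X.mean(axis=1, keepdims=True)
W, mu, S, Sw = lda_directions(X, labels, N=1)
```

In this toy example, the direction with the smallest generalised eigenvalue should point along the axis that separates the two classes.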

Several probabilistic latent variable models that exploit class information have recently been proposed (cf. (Prince & Elder, 2007; Zhang & Yeung, 2009; Ioffe, 2006)). In (Prince & Elder, 2007; Zhang & Yeung, 2009), two related attempts were made to formulate a PLDA. Considering $\mathbf{x}_{ic}$ to be the $i$-th sample of the $c$-th class, the generative model of (Prince & Elder, 2007) can be described as:

$$\mathbf{x}_{ic} = \mathbf{F}\mathbf{h}_c + \mathbf{G}\mathbf{w}_{ic} + \boldsymbol{\epsilon}_{ic}, \quad \mathbf{h}_c, \mathbf{w}_{ic} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \quad \boldsymbol{\epsilon}_{ic} \sim \mathcal{N}(\mathbf{0}, \Sigma) \quad (4)$$

where $\mathbf{h}_c$ represents the class-specific weights and $\mathbf{w}_{ic}$ the weights of each individual sample, with $\mathbf{F}$ and $\mathbf{G}$ denoting the corresponding loadings. Regarding (Zhang & Yeung, 2009), the probabilistic model is as follows:

$$\mathbf{x}_{ic} = \mathbf{F}_c\mathbf{h}_{ic} + \boldsymbol{\epsilon}_{ic}, \quad \mathbf{h}_{ic} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \quad \boldsymbol{\epsilon}_{ic} \sim \mathcal{N}(\mathbf{0}, \Sigma) \quad (5)$$

We note that the two models become equivalent when choosing a common $\mathbf{F}$ (Eq. 5) for all classes while also disregarding the matrix $\mathbf{G}$. In this case, the ML solution is given by the eigenvectors of $\mathbf{S}_w$ corresponding to the largest eigenvalues. Hence, the solution is vastly different from the one obtained by deterministic LDA, which keeps the smallest ones (Eq. 3). When learning a different $\mathbf{F}_c$ per class, the model of (Zhang & Yeung, 2009) reduces to applying PPCA per class.

To the best of our knowledge, the only probabilistic model whose ML solution is closely related to that of deterministic LDA is that of (Ioffe, 2006). The probabilistic model is defined as follows: $\mathbf{x} \in C_i$, $\mathbf{x}|\mathbf{y} \sim \mathcal{N}(\mathbf{y}, \Phi_w)$, $\mathbf{y} \sim \mathcal{N}(\mathbf{m}, \Phi_b)$, $\mathbf{V}^T\Phi_b\mathbf{V} = \Psi$ and $\mathbf{V}^T\Phi_w\mathbf{V} = \mathbf{I}$, $\mathbf{A} = \mathbf{V}^{-T}$, $\Phi_w = \mathbf{A}\mathbf{A}^T$, $\Phi_b = \mathbf{A}\Psi\mathbf{A}^T$. The observations are generated as:

$$\mathbf{x}_i = \mathbf{A}\mathbf{u}, \quad \mathbf{u} \sim \mathcal{N}(\mathbf{v}, \mathbf{I}), \quad \mathbf{v} \sim \mathcal{N}(\mathbf{0}, \Psi). \quad (6)$$

The drawback of this model is that it requires all classes to contain the same number of samples (Ioffe, 2006). As we will show, our formulation overcomes this limitation.

¹ We adopt this formulation of LDA instead of the equivalent one of maximizing the trace of the between-class scatter matrix (Belhumeur et al., 1997), since this facilitates our subsequent discussion of probabilistic LDA alternatives.

2.3. Locality Preserving Projections 

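Locality Preserving Projections (LPP) is the linear counterpart of Laplacian Eigenmaps (Niyogi, 2004). The aim is to obtain a set of projection bases $\mathbf{W}$ and a latent space $\mathbf{Y} = \mathbf{W}^T\mathbf{X}$ which preserves the neighbourhoods of the original samples. First, let us define a set of weights that represent locality. Common choices for the weights are:

- the heat kernel $w_{ij} = e^{-\|\mathbf{x}_i - \mathbf{x}_j\|^2/t}$;
- a set of constant weights, with $w_{ij} = 1$ if the $i$-th and the $j$-th vectors are adjacent and $w_{ij} = 0$ otherwise.

LPP finds a set of projection bases $\mathbf{W}$ by solving the following problem:

$$\begin{aligned}\mathbf{W}_o &= \arg\min_{\mathbf{W}} \sum_{i=1}^{T}\sum_{j=1}^{T} w_{ij}\,\|\mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\mathbf{x}_j\|^2 \\ &= \arg\min_{\mathbf{W}} \operatorname{tr}[\mathbf{W}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{W}], \quad \text{s.t. } \mathbf{W}^T\mathbf{X}\mathbf{D}\mathbf{X}^T\mathbf{W} = \mathbf{I}\end{aligned} \quad (7)$$

In Eq. 7, $\mathbf{L} = \mathbf{D} - \Omega$ is the graph Laplacian of the weight matrix $\Omega = [w_{ij}]$, and $\mathbf{D} = \operatorname{diag}(\Omega\mathbf{1})$. Here $\operatorname{diag}(\mathbf{a})$ is the diagonal matrix whose main diagonal is the vector $\mathbf{a}$, and $\mathbf{1}$ is a vector of ones. With the chosen weights $w_{ij}$, the objective function results in a heavy penalty if neighbouring points $\mathbf{x}_i$ and $\mathbf{x}_j$ are mapped far apart. Therefore, its minimization ensures that if $\mathbf{x}_i$ and $\mathbf{x}_j$ are near, then the projected features $\mathbf{y}_i = \mathbf{W}^T\mathbf{x}_i$ and $\mathbf{y}_j = \mathbf{W}^T\mathbf{x}_j$ are near as well. To the best of our knowledge, no probabilistic models exist for LPP. In the following (Sec. 3, 4), we show how a probabilistic version of LPP arises by choosing an appropriate prior over the latent space $\mathbf{Y}$.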
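A minimal sketch of Eq. 7 follows. It makes three assumptions:

- The graph is a $k$-nearest-neighbour graph whose edges carry heat-kernel weights.
- The graph is made undirected by taking the element-wise maximum of $\Omega$ and $\Omega^T$.
- The neighbourhood size `k` and kernel width `t` are hypothetical parameters, not values from the text.

The last lines of the usage check the identity behind the second equality of Eq. 7: $\sum_{i,j} w_{ij}\|\mathbf{y}_i - \mathbf{y}_j\|^2 = 2\operatorname{tr}[\mathbf{W}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{W}]$.

```python
import numpy as np
from scipy.linalg import eigh

def lpp_directions(X, N, k=5, t=1.0):
    """Sketch of Eq. 7: k-NN graph with heat-kernel weights, then a generalised eigenproblem."""
    T = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)                     # a point is not its own neighbour
    nn = np.argsort(dist2, axis=1)[:, :k]               # k nearest neighbours per sample
    rows, cols = np.repeat(np.arange(T), k), nn.ravel()
    Omega = np.zeros((T, T))
    Omega[rows, cols] = np.exp(-dist2[rows, cols] / t)  # heat-kernel weights w_ij
    Omega = np.maximum(Omega, Omega.T)                  # make the graph undirected
    D = np.diag(Omega.sum(axis=1))                      # D = diag(Omega 1)
    L = D - Omega                                       # graph Laplacian
    evals, evecs = eigh(X @ L @ X.T, X @ D @ X.T)       # W^T X D X^T W = I
    return evecs[:, :N], evals[:N], L, D

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 120))
X -= X.mean(axis=1, keepdims=True)
W, evals, L, D = lpp_directions(X, N=2)

# Check: sum_ij w_ij ||y_i - y_j||^2 equals 2 tr[W^T X L X^T W]
Y = W.T @ X
pair = ((Y[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
print(np.isclose(((D - L) * pair).sum(), 2 * np.trace(W.T @ X @ L @ X.T @ W)))
```

The dense $T \times T$ weight matrix keeps the sketch short; for large $T$ a sparse neighbourhood graph would be the natural replacement.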

2.4. Probabilistic Slow Feature Analysis 

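Now let us consider the case where the columns of $\mathbf{X}$ are samples of a time series of length $T$. Given $T$ sequential observation vectors $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_T]$, the aim of slow feature analysis (SFA) is to find an output signal representation $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_T]$ whose features change slowest over time (Wiskott & Sejnowski, 2002). Assuming again a linear mapping $\mathbf{Y} = \mathbf{W}^T\mathbf{X}$ for the output representation, SFA minimizes the slowness of these values, defined as the variance of the first derivative of $\mathbf{Y}$. Formally, $\mathbf{W}$ of SFA is computed as

$$\mathbf{W}_o = \arg\min_{\mathbf{W}} \operatorname{tr}[\mathbf{W}^T\dot{\mathbf{X}}\dot{\mathbf{X}}^T\mathbf{W}], \quad \text{s.t. } \mathbf{W}^T\mathbf{S}\mathbf{W} = \mathbf{I} \quad (8)$$

where $\dot{\mathbf{X}}$ is the first derivative matrix, usually computed as the first-order difference, i.e., $\dot{\mathbf{x}}_j = \mathbf{x}_j - \mathbf{x}_{j-1}$. An ML solution of SFA was recently proposed in (Turner & Sahani, 2007). The idea was to incorporate a Gaussian linear dynamical system prior over the latent space $\mathbf{Y}$. The proposed generative model is

$$\begin{aligned} P(\mathbf{x}_t|\mathbf{W}, \mathbf{y}_t, \sigma_x) &= \mathcal{N}(\mathbf{W}^{-T}\mathbf{y}_t, \sigma_x^2\mathbf{I}) \\ P(\mathbf{y}_t|\mathbf{y}_{t-1}, \lambda_{1:N}, \sigma_{1:N}) &= \prod_{n=1}^{N} P(y_{n,t}|y_{n,t-1}, \lambda_n, \sigma_n^2) \\ P(y_{n,t}|y_{n,t-1}, \lambda_n, \sigma_n^2) &= \mathcal{N}(\lambda_n y_{n,t-1}, \sigma_n^2) \\ P(y_{n,1}|\sigma_{n,1}^2) &= \mathcal{N}(0, \sigma_{n,1}^2). \end{aligned} \quad (9)$$

As we will show, SFA is indeed a special case of our general model.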
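Eq. 8 is again a generalised eigenproblem. Here the problem is between the scatter of the temporal differences, $\dot{\mathbf{X}}\dot{\mathbf{X}}^T$, and $\mathbf{S}$. The minimal sketch below rests on three assumptions:

- $\dot{\mathbf{X}}$ is computed by first-order differences.
- The scatter matrices are unnormalised sums, which changes only the scale.
- The toy mixture of a slow and a fast sinusoid is purely illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def sfa_directions(X, N):
    """Solve Eq. 8 for a zero-mean F x T time series X (one time step per column)."""
    Xdot = np.diff(X, axis=1)             # first-order differences x_t - x_{t-1}
    A = Xdot @ Xdot.T                     # scatter of the temporal derivative
    S = X @ X.T                           # total scatter
    evals, evecs = eigh(A, S)             # ascending: slowest directions first, W^T S W = I
    return evecs[:, :N], evals[:N]

t = np.linspace(0.0, 2.0 * np.pi, 500)
slow, fast = np.sin(t), np.sin(15.0 * t)
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ np.vstack([slow, fast])   # linear mixture
X -= X.mean(axis=1, keepdims=True)
W, evals = sfa_directions(X, N=1)
y = (W.T @ X)[0]                          # slowest extracted feature
```

On this mixture, the extracted feature should be almost perfectly correlated with the slow source (up to sign and scale).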

Summarizing, in the following sections we formulate a unified probabilistic framework which:

(a) incorporates PCA as a special case;
(b) produces a probabilistic LDA which (1) makes no assumptions regarding the number of samples per class (unlike (Ioffe, 2006)) and (2) has an ML solution for the loading matrix $\mathbf{W}$ with a direction similar to that of deterministic LDA (Eq. 3);
(c) provides, to the best of our knowledge, the first probabilistic model that explains LPP;
(d) naturally incorporates the recently proposed ML framework of SFA (Turner & Sahani, 2007) as a special case, while also providing an EM optimisation algorithm for SFA;
(e) provides variance estimates for each dimension along with the observation variance.

Point (e) also differentiates our model from existing probabilistic component analysis techniques (PPCA, PLDA, HLDA), since it yields more robust estimates.

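3. A Maximum Likelihood Framework for Unified Component Analysis

In this section we present the general framework and show how each of the sets of projections of PCA, LDA, LPP and SFA can be derived within an ML framework. Let us consider the generative model

$$\mathbf{x}_i = \mathbf{W}^{-T}\mathbf{y}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \sigma_x^2\mathbf{I}).$$

We will prove the following. Choose one of the priors defined below as the latent prior, and then take the ML solution with respect to the parameters. The result is the aforementioned family of probabilistic component models. The priors, parametrised by $\beta = \{\sigma_{1:N}, \lambda_{1:N}\}$, are:

• An MRF with full connectivity: each latent node $\mathbf{y}_i$ is connected to all other latent nodes $\mathbf{y}_j$, $j \neq i$. Its energy is built from pairwise potentials $\lambda_n(y_{n,i} - y_{n,j})^2$ over all pairs $(i, j)$ and can be written as

$$P(\mathbf{Y}|\beta) = \frac{1}{Z}\exp\left\{-\frac{1}{2}\left(\operatorname{tr}[\Lambda^{(1)}\mathbf{Y}\mathbf{Y}^T] + \operatorname{tr}[\Lambda^{(2)}\mathbf{Y}\mathbf{M}\mathbf{Y}^T]\right)\right\} \quad (10)$$

where $\mathbf{M} = \frac{1}{T}\mathbf{1}\mathbf{1}^T$ and $\Lambda^{(1)}$, $\Lambda^{(2)}$ are diagonal matrices whose entries depend on $\lambda_n$ and $\sigma_n$.

• A product of two MRFs. In the first, each latent node $\mathbf{y}_i$ is connected only to other latent nodes in the same class ($\mathbf{y}_j$, $j \in \mathcal{C}_i$). In the second, each latent node $\mathbf{y}_i$ is connected to all other latent nodes ($\mathbf{y}_j$, $j \neq i$).

$$P(\mathbf{Y}|\beta) = \frac{1}{Z}\exp\left\{-\frac{1}{2}\left(\operatorname{tr}[\Lambda^{(1)}\mathbf{Y}\mathbf{M}_c\mathbf{Y}^T] + \operatorname{tr}[\Lambda^{(2)}\mathbf{Y}\mathbf{M}_t\mathbf{Y}^T]\right)\right\} \quad (11)$$

The quantities in Eq. 11 are:

- $\mathbf{M}_c = \mathbf{I} - \operatorname{diag}[\mathbf{C}_1, \ldots, \mathbf{C}_K]$, a block-diagonal construction with the samples ordered by class;
- $\mathbf{C}_c = \frac{1}{|C_c|}\mathbf{1}_c\mathbf{1}_c^T$, where $\mathbf{1}_c$ is a vector of $|C_c|$ ones;
- $\mathbf{M}_t = \mathbf{I} - \mathbf{M}$;
- $\Lambda^{(1)} \triangleq [\delta_{mn}\,\lambda_n/\sigma_n^2]$ and $\Lambda^{(2)} \triangleq [\delta_{mn}\,(1-\lambda_n)^2/\sigma_n^2]$;
- $\mathcal{C}_i = \{j : \exists\, C_l \text{ s.t. } \mathbf{x}_j, \mathbf{x}_i \in C_l,\ j \neq i\}$.

• A product of two MRFs. In the first, each latent node $\mathbf{y}_i$ is connected to all other latent nodes that belong to the neighbourhood of $\mathbf{y}_i$, i.e. $\mathbf{y}_j$, $j \in \mathcal{N}_i$. In the second, we only have individual potentials per node.

$$P(\mathbf{Y}|\beta) = \frac{1}{Z}\exp\left\{-\frac{1}{2}\left(\operatorname{tr}[\Lambda^{(1)}\mathbf{Y}\mathbf{L}\mathbf{Y}^T] + \operatorname{tr}[\Lambda^{(2)}\mathbf{Y}\mathbf{D}\mathbf{Y}^T]\right)\right\} \quad (12)$$

where $\mathbf{L}$ and $\mathbf{D}$ are the ones defined for LPP in Section 2.3, and $\Lambda^{(1)}$ and $\Lambda^{(2)}$ are defined as above.

• A linear dynamical system prior over the latent space.

$$\begin{aligned} P(\mathbf{Y}|\beta) &= \frac{1}{Z}\exp\left\{-\sum_{n=1}^{N}\left(\frac{y_{n,1}^2}{2\sigma_{n,1}^2} + \frac{1}{2\sigma_n^2}\sum_{t=2}^{T}(y_{n,t} - \lambda_n y_{n,t-1})^2\right)\right\} \\ &\approx \frac{1}{Z}\exp\left\{-\frac{1}{2}\left(\operatorname{tr}[\Lambda^{(1)}\mathbf{Y}\mathbf{K}\mathbf{Y}^T] + \operatorname{tr}[\Lambda^{(2)}\mathbf{Y}\mathbf{Y}^T]\right)\right\} \end{aligned} \quad (13)$$

where $\mathbf{K} = \mathbf{P}\mathbf{P}^T$ and $\mathbf{P}$ is a $T \times (T-1)$ matrix with elements $p_{ii} = 1$ and $p_{(i+1)i} = -1$ (the rest are zero). The approximation holds when $T \to \infty$. Again, $\Lambda^{(1)}$ and $\Lambda^{(2)}$ are defined as above.

In all cases, $Z$ is the partition function, i.e. the integral over $\mathbf{Y}$ of the corresponding unnormalised density. The motivation behind choosing the above priors over the latent space was given by the influential analysis in (He et al., 2005), where the connection between (the deterministic) LPP, PCA and LDA was explored. A further piece of the puzzle was added by the recent work of (Turner & Sahani, 2007), where the linear dynamical system prior (Eq. 13) was used to derive SFA in an ML framework. By formulating the proper priors for these models, we unify these subspace methods in a single probabilistic framework. The framework consists of a linear generative model along with a prior of the form

$$P(\mathbf{Y}) \propto \exp\left\{-\frac{1}{2}\left(\operatorname{tr}[\Lambda^{(1)}\mathbf{Y}\mathbf{B}^{(1)}\mathbf{Y}^T] + \operatorname{tr}[\Lambda^{(2)}\mathbf{Y}\mathbf{B}^{(2)}\mathbf{Y}^T]\right)\right\}. \quad (14)$$

The differentiation amongst these models lies in the neighbourhood over which the potentials are defined. In fact, the varying neighbourhood system translates into different matrices $\mathbf{B}^{(1)}$ and $\mathbf{B}^{(2)}$ in the functional form of the potentials:

- for Eq. 10, $\mathbf{B}^{(1)} = \mathbf{I}$ and $\mathbf{B}^{(2)} = \mathbf{M}$;
- for Eq. 11, $\mathbf{B}^{(1)} = \mathbf{M}_c$ and $\mathbf{B}^{(2)} = \mathbf{M}_t$;
- for Eq. 12, $\mathbf{B}^{(1)} = \mathbf{L}$ and $\mathbf{B}^{(2)} = \mathbf{D}$;
- for Eq. 13, $\mathbf{B}^{(1)} = \mathbf{K}$ and $\mathbf{B}^{(2)} = \mathbf{I}$.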
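To make the neighbourhood matrices concrete, the sketch below builds the pair $(\mathbf{B}^{(1)}, \mathbf{B}^{(2)})$ for each prior from the definitions above. The LDA pair is built from class labels and the LPP pair from a symmetric weight matrix. The code then checks two identities behind the trace forms.

- For the fully connected potentials, $\sum_{i,j}(y_i - y_j)^2 = 2T\,\mathbf{y}(\mathbf{I} - \mathbf{M})\mathbf{y}^T$.
- For the linear dynamical system, the exact energy $\frac{1}{\sigma^2}\sum_{t \ge 2}(y_t - \lambda y_{t-1})^2$ differs from the trace form of Eq. 13 only by the boundary term $\frac{1-\lambda}{\sigma^2}(\lambda y_T^2 - y_1^2)$. This is why the approximation holds as $T \to \infty$.

The sketch assumes a single latent dimension with $\Lambda^{(1)} = \lambda/\sigma^2$ and $\Lambda^{(2)} = (1-\lambda)^2/\sigma^2$, and it uses $\sigma^2 = 1 - \lambda^2$. The function names are hypothetical.

```python
import numpy as np

def prior_B_matrices(T, labels=None, Omega=None):
    """Return {name: (B1, B2)} for the priors of Eq. 10-13 on T latent nodes."""
    I = np.eye(T)
    M = np.ones((T, T)) / T                           # M = (1/T) 1 1^T
    B = {"PCA": (I, M)}
    if labels is not None:
        blocks = np.zeros((T, T))
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            blocks[np.ix_(idx, idx)] = 1.0 / len(idx)  # C_c = (1/|C_c|) 1_c 1_c^T
        B["LDA"] = (I - blocks, I - M)                # (M_c, M_t)
    if Omega is not None:
        D = np.diag(Omega.sum(axis=1))
        B["LPP"] = (D - Omega, D)                     # (L, D)
    P = np.zeros((T, T - 1))
    P[np.arange(T - 1), np.arange(T - 1)] = 1.0       # p_ii = 1
    P[np.arange(1, T), np.arange(T - 1)] = -1.0       # p_(i+1)i = -1
    B["SFA"] = (P @ P.T, I)                           # (K, I)
    return B

def lds_energy(y, lam, sigma2):
    """Exact LDS energy (1/sigma2) sum_{t>=2} (y_t - lam y_{t-1})^2 for one latent dimension."""
    return np.sum((y[1:] - lam * y[:-1]) ** 2) / sigma2

rng = np.random.default_rng(4)
T, lam = 50, 0.8
sigma2 = 1.0 - lam ** 2
labels = np.repeat([0, 1], T // 2)
B = prior_B_matrices(T, labels=labels)
y = rng.normal(size=T)

I, M = B["PCA"]
pairwise = np.sum((y[:, None] - y[None, :]) ** 2)
print(np.isclose(pairwise, 2 * T * y @ (I - M) @ y))                  # True

K, _ = B["SFA"]
trace_form = lam / sigma2 * y @ K @ y + (1 - lam) ** 2 / sigma2 * y @ y
boundary = (1 - lam) / sigma2 * (lam * y[-1] ** 2 - y[0] ** 2)
print(np.isclose(lds_energy(y, lam, sigma2), trace_form + boundary))  # True
```

`np.ix_` is used so that the class blocks are placed correctly even when the samples are not sorted by class.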

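In the following we show that ML estimation using these potentials results in PCA, LDA and LPP. SFA is a special case: (Turner & Sahani, 2007) already showed that a potential of the form of Eq. 13, within an ML framework, produces a projection with the same direction as Eq. 8.

Adopting a simple linear model, $\mathbf{x}_i = \mathbf{W}^{-T}\mathbf{y}_i + \boldsymbol{\epsilon}_i$, the corresponding conditional data (observation) probability is a Gaussian,

$$P(\mathbf{x}_t|\mathbf{y}_t, \mathbf{W}, \sigma_x^2) = \mathcal{N}(\mathbf{W}^{-T}\mathbf{y}_t, \sigma_x^2\mathbf{I}). \quad (15)$$

Having defined the prior (Eq. 10, 11, 12, 13) and the data probability (Eq. 15), we can now derive the likelihood of our model as follows:

$$P(\mathbf{X}|\Theta) = \int \prod_{t=1}^{T} P(\mathbf{x}_t|\mathbf{y}_t, \mathbf{W}, \sigma_x^2)\, P(\mathbf{Y}|\sigma_{1:N}, \lambda_{1:N})\, d\mathbf{Y} \quad (16)$$

where $\Theta = \{\sigma_x^2, \mathbf{W}, \sigma_{1:N}, \lambda_{1:N}\}$.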



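In the following we show what happens when the above priors are substituted into Eq. 16 and the likelihood is maximised. The resulting $\mathbf{W}$ has the same direction (up to a scale ambiguity) as that of deterministic PCA, LDA, LPP and SFA. First, by substituting the general prior (Eq. 14) into the likelihood, we obtain

$$P(\mathbf{X}|\Theta) = \int \prod_{t=1}^{T} P(\mathbf{x}_t|\mathbf{y}_t, \mathbf{W}, \sigma_x^2)\, \frac{1}{Z}\exp\left\{-\frac{1}{2}\left(\operatorname{tr}[\Lambda^{(1)}\mathbf{Y}\mathbf{B}^{(1)}\mathbf{Y}^T] + \operatorname{tr}[\Lambda^{(2)}\mathbf{Y}\mathbf{B}^{(2)}\mathbf{Y}^T]\right)\right\} d\mathbf{Y}. \quad (17)$$

In order to obtain an ML solution, we take the limit $\sigma_x \to 0$. In this limit, the observation density becomes a Dirac delta centred at $\mathbf{W}^{-T}\mathbf{y}_t$:

$$P(\mathbf{X}|\Theta) = \int \prod_{t=1}^{T} \delta(\mathbf{x}_t - \mathbf{W}^{-T}\mathbf{y}_t)\, \frac{1}{Z}\exp\left\{-\frac{1}{2}\left(\operatorname{tr}[\Lambda^{(1)}\mathbf{Y}\mathbf{B}^{(1)}\mathbf{Y}^T] + \operatorname{tr}[\Lambda^{(2)}\mathbf{Y}\mathbf{B}^{(2)}\mathbf{Y}^T]\right)\right\} d\mathbf{Y}. \quad (18)$$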
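Next, we complete the integrals. Through the change of variables $\mathbf{y}_t = \mathbf{W}^T\mathbf{x}_t$, this contributes the Jacobian term $T\log|\mathbf{W}|$. Taking logarithms, we obtain the conditional log-likelihood:

$$\mathcal{L}(\Theta) = \log P(\mathbf{X}|\Theta) = -\log Z + T\log|\mathbf{W}| - \frac{1}{2}\operatorname{tr}\left[\Lambda^{(1)}\mathbf{W}^T\mathbf{X}\mathbf{B}^{(1)}\mathbf{X}^T\mathbf{W} + \Lambda^{(2)}\mathbf{W}^T\mathbf{X}\mathbf{B}^{(2)}\mathbf{X}^T\mathbf{W}\right] \quad (19)$$

where $\log Z$ is a constant term independent of $\mathbf{W}$. By maximising for $\mathbf{W}$ (setting $\partial\mathcal{L}/\partial\mathbf{W} = \mathbf{0}$ and transposing), we obtain

$$T\mathbf{W}^{-1} - \left(\Lambda^{(1)}\mathbf{W}^T\mathbf{X}\mathbf{B}^{(1)}\mathbf{X}^T + \Lambda^{(2)}\mathbf{W}^T\mathbf{X}\mathbf{B}^{(2)}\mathbf{X}^T\right) = \mathbf{0} \;\Rightarrow\; \Lambda^{(1)}\mathbf{W}^T\mathbf{X}\mathbf{B}^{(1)}\mathbf{X}^T\mathbf{W} + \Lambda^{(2)}\mathbf{W}^T\mathbf{X}\mathbf{B}^{(2)}\mathbf{X}^T\mathbf{W} = T\mathbf{I}. \quad (20)$$

Since $\Lambda^{(1)}$ and $\Lambda^{(2)}$ are diagonal matrices, the $\mathbf{W}$ which satisfies Eq. 20 is the one that simultaneously diagonalises (up to a scale ambiguity) $\mathbf{X}\mathbf{B}^{(1)}\mathbf{X}^T$ and $\mathbf{X}\mathbf{B}^{(2)}\mathbf{X}^T$. To see why, note that both $\mathbf{W}^T\mathbf{X}\mathbf{B}^{(k)}\mathbf{X}^T\mathbf{W}$ are symmetric. Comparing the $(m,n)$ and $(n,m)$ entries of Eq. 20 then forces their off-diagonal entries to vanish whenever the pairs $(\Lambda^{(1)}_{mm}, \Lambda^{(2)}_{mm})$ and $(\Lambda^{(1)}_{nn}, \Lambda^{(2)}_{nn})$ are not proportional. By substituting the $\mathbf{B}$ matrices defined for Eq. 14 for each prior, we now consider all cases separately:

• PCA: In the case of $P(\mathbf{Y})$ of Eq. 10, $\mathbf{X}\mathbf{M}\mathbf{X}^T = \mathbf{0}$ for zero-mean data. Eq. 20 is therefore reformulated as $\mathbf{W}^T\mathbf{X}\mathbf{X}^T\mathbf{W} = T[\Lambda^{(1)}]^{-1}$. Hence $\mathbf{W}$ is given (up to a scale ambiguity) by the eigenvectors of the total scatter matrix $\mathbf{S}$.

• LDA: In the case of $P(\mathbf{Y})$ of Eq. 11, Eq. 20 is reformulated as $\Lambda^{(1)}\mathbf{W}^T\mathbf{X}\mathbf{M}_c\mathbf{X}^T\mathbf{W} + \Lambda^{(2)}\mathbf{W}^T\mathbf{X}\mathbf{X}^T\mathbf{W} = T\mathbf{I}$. This uses $\mathbf{X}\mathbf{M}_t\mathbf{X}^T = \mathbf{X}\mathbf{X}^T$ for zero-mean data and $\mathbf{X}\mathbf{M}_c\mathbf{X}^T = \mathbf{S}_w$. $\mathbf{W}$ is given by the directions that simultaneously diagonalise $\mathbf{S}$ and $\mathbf{S}_w$.

• LPP: In the case of $P(\mathbf{Y})$ of Eq. 12, Eq. 20 is reformulated as $\Lambda^{(1)}\mathbf{W}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{W} + \Lambda^{(2)}\mathbf{W}^T\mathbf{X}\mathbf{D}\mathbf{X}^T\mathbf{W} = T\mathbf{I}$. $\mathbf{W}$ is given by the directions that simultaneously diagonalise $\mathbf{X}\mathbf{L}\mathbf{X}^T$ and $\mathbf{X}\mathbf{D}\mathbf{X}^T$.

• SFA: In the case of $P(\mathbf{Y})$ of Eq. 13, Eq. 20 is reformulated as $\Lambda^{(1)}\mathbf{W}^T\mathbf{X}\mathbf{K}\mathbf{X}^T\mathbf{W} + \Lambda^{(2)}\mathbf{W}^T\mathbf{X}\mathbf{X}^T\mathbf{W} = T\mathbf{I}$. $\mathbf{W}$ is given by the directions that simultaneously diagonalise $\mathbf{X}\mathbf{K}\mathbf{X}^T$ and $\mathbf{X}\mathbf{X}^T$. Note that $\mathbf{X}\mathbf{K}\mathbf{X}^T = \dot{\mathbf{X}}\dot{\mathbf{X}}^T$ for first-order differences.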
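The stationarity argument can be checked numerically. Write $\mathbf{C}_1 = \mathbf{X}\mathbf{B}^{(1)}\mathbf{X}^T$ and $\mathbf{C}_2 = \mathbf{X}\mathbf{B}^{(2)}\mathbf{X}^T$. The generalised eigenvectors $\mathbf{V}$ of $(\mathbf{C}_1, \mathbf{C}_2)$ satisfy $\mathbf{V}^T\mathbf{C}_2\mathbf{V} = \mathbf{I}$ and $\mathbf{V}^T\mathbf{C}_1\mathbf{V} = \operatorname{diag}(\mu)$. Rescaling column $n$ by $s_n = \sqrt{T/(\Lambda^{(1)}_{nn}\mu_n + \Lambda^{(2)}_{nn})}$ then satisfies Eq. 20 exactly, so the gradient of Eq. 19 vanishes. The sketch rests on three assumptions:

- $\mathbf{W}$ is square and invertible ($N = F$), as the $\log|\mathbf{W}|$ term requires.
- It uses the SFA matrices.
- It takes arbitrary positive values as the diagonals of $\Lambda^{(1)}$ and $\Lambda^{(2)}$.

All names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def stationary_W(C1, C2, l1, l2, T):
    """Scale the generalised eigenvectors of (C1, C2) so that Eq. 20 holds."""
    mu, V = eigh(C1, C2)                     # V^T C2 V = I, V^T C1 V = diag(mu)
    s = np.sqrt(T / (l1 * mu + l2))          # per-column scales s_n
    return V * s                             # W = V diag(s)

def loglik_grad(W, C1, C2, l1, l2, T):
    """Gradient of T log|W| - 1/2 tr[L1 W^T C1 W + L2 W^T C2 W] with respect to W."""
    return T * np.linalg.inv(W).T - (C1 @ W) * l1 - (C2 @ W) * l2

rng = np.random.default_rng(5)
F, T = 3, 400
X = np.cumsum(rng.normal(size=(F, T)), axis=1)   # a random-walk time series
X -= X.mean(axis=1, keepdims=True)
Xdot = np.diff(X, axis=1)
C1, C2 = Xdot @ Xdot.T, X @ X.T                  # X K X^T and X X^T (SFA case)
l1 = np.array([2.0, 1.0, 0.5])                   # hypothetical diagonal of Lambda^(1)
l2 = np.array([0.1, 0.3, 0.6])                   # hypothetical diagonal of Lambda^(2)

W = stationary_W(C1, C2, l1, l2, T)
G = loglik_grad(W, C1, C2, l1, l2, T)            # should vanish up to round-off
```

Permuting which $(\Lambda^{(1)}_{nn}, \Lambda^{(2)}_{nn})$ pair is attached to which eigenvector gives other stationary points. This matches the observation that the values of $\lambda_n$ only order the solutions, not their directions.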

The direction of $\mathbf{W}$ does not depend on $\sigma_n$ and $\lambda_n$, which can be estimated by optimising Eq. 19 with regard to these parameters. In this work we provide update rules for $\sigma_n$ and $\lambda_n$ using an EM framework. As we can see, the ML loading $\mathbf{W}$ does not depend on the exact setting of $\lambda_n$, so long as they are all different. If $0 < \lambda_n < 1$ for all $n$, then larger values of $\lambda_n$ correspond to:

- more expressive latents (in the case of PCA);
- more discriminant latents (in LDA);
- more local latents (in LPP);
- slower latents (in the case of SFA).

This corresponds directly to the ordering of the solutions of PCA, LDA, LPP and SFA. To recover exact equivalence to LDA, LPP and SFA, another limit is required that corrects the scales. There are several choices, but a natural one is to let $\sigma_n^2 = 1 - \lambda_n^2$. In the case of LDA and SFA, this choice fixes the prior covariance of the latent variables to be one ($\mathbf{W}^T\mathbf{X}\mathbf{X}^T\mathbf{W} = \mathbf{I}$). In the case of LPP, it forces $\mathbf{W}^T\mathbf{X}\mathbf{D}\mathbf{X}^T\mathbf{W} = \mathbf{I}$. This choice of $\sigma_n$ has also been discussed in (Turner & Sahani, 2007) for slow feature analysis.

4. A Unified Expectation Maximization for Component Analysis

In the following we propose a unified EM framework for component analysis. This framework can treat all priors with undirected links (such as Eq. 10, Eq. 11 and Eq. 12). The EM of the prior in Eq. 13 is different, since that prior contains only directed links with no loops. It can thus be solved (without any approximations) similarly to the EM of a linear dynamical system (Bishop, 2006).

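In order to perform EM with an MRF prior, we adopt the simple and elegant mean-field approximation theory (Qian & Titterington, 1991; Celeux et al., 2003; Zhang, 1992). Let us consider the priors we defined in Sec. 3. Without loss of generality, we now assume the model

$$\mathbf{x}_i = \mathbf{W}\mathbf{y}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \sigma_x^2\mathbf{I}), \quad \mathbf{Y} \sim P(\mathbf{Y}|\beta). \quad (21)$$

For clarity, we use the following notation:

- $\beta = \{\sigma_{1:N}, \lambda_{1:N}\}$ denotes the set of parameters associated with the prior (i.e. the energy function);
- $\theta = \{\mathbf{W}, \sigma_x\}$ denotes the parameters related to the observation model;
- $\Theta = \{\theta, \beta\}$ denotes the total set of parameters.

In agreement with (Celeux et al., 2003), we replace the marginal distribution $P(\mathbf{Y}|\beta)$ by the mean-field approximation

$$P(\mathbf{Y}|\beta) \approx \prod_{i=1}^{T} P(\mathbf{y}_i|\mathbf{m}_i^{(R)}, \beta). \quad (22)$$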



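Since different models have different connectivity, the mean-field influence on each latent point $\mathbf{y}_i$ now depends on this specific connectivity through $\mathbf{m}_i^{(R)}$, which is a function of $\mathbb{E}[\mathbf{y}_j]$. After calculating the normalising integral for the priors of Eq. 10-12 and given the mean field, we obtain

$$P(\mathbf{y}_i|\mathbf{m}_i^{(R)}, \beta) = \begin{cases} \mathcal{N}(\mathbf{y}_i|\mathbf{m}_i^{(\mathrm{PCA})}, \Sigma^{(\mathrm{PCA})}) & \text{for (10)} \\ \mathcal{N}(\mathbf{y}_i|\mathbf{m}_i^{(\mathrm{LDA})}, \Sigma^{(\mathrm{LDA})}) & \text{for (11)} \\ \mathcal{N}(\mathbf{y}_i|\mathbf{m}_i^{(\mathrm{LPP})}, \Sigma^{(\mathrm{LPP})}) & \text{for (12)} \end{cases} \quad (23)$$

where $R \in \{\mathrm{PCA}, \mathrm{LDA}, \mathrm{LPP}\}$. The means $\mathbf{m}_i^{(R)}$ are computed from the mean-field statistics

$$\boldsymbol{\mu}_i = \frac{1}{T-1}\sum_{j \neq i}\mathbb{E}[\mathbf{y}_j], \quad \boldsymbol{\mu}_{\mathcal{C}_i} = \frac{1}{|\mathcal{C}_i|}\sum_{j \in \mathcal{C}_i}\mathbb{E}[\mathbf{y}_j], \quad \boldsymbol{\mu}_{\mathcal{N}_i} = \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i}\mathbb{E}[\mathbf{y}_j],$$

i.e. the mean, the class mean and the neighbourhood mean. The variances satisfy $\Sigma^{(\mathrm{LDA})} = \Sigma^{(\mathrm{LPP})}$. Furthermore, $\Lambda \triangleq [\delta_{mn}\lambda_n]$. The remaining diagonal scaling matrices depend on $\lambda_n$ through terms of the form $\lambda_n + (1-\lambda_n)^2$.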
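The three mean-field statistics above are simple averages of the current posterior means. The sketch below assumes the following input formats, which, like the function name, are hypothetical:

- `EY` stores $\mathbb{E}[\mathbf{y}_j]$ column-wise.
- `labels` gives the class memberships; each class needs at least two samples.
- `neighbours[i]` is an index array for $\mathcal{N}_i$.

```python
import numpy as np

def mean_field_stats(EY, labels, neighbours):
    """Return (mean over j != i, class mean over C_i, neighbourhood mean over N_i), each N x T."""
    N, T = EY.shape
    mu_all = (EY.sum(axis=1, keepdims=True) - EY) / (T - 1)   # (1/(T-1)) sum_{j != i} E[y_j]
    idx = np.arange(T)
    mu_class = np.stack(
        [EY[:, (labels == labels[i]) & (idx != i)].mean(axis=1) for i in range(T)], axis=1)
    mu_nb = np.stack([EY[:, neighbours[i]].mean(axis=1) for i in range(T)], axis=1)
    return mu_all, mu_class, mu_nb

EY = np.arange(12, dtype=float).reshape(2, 6)     # toy posterior means, N = 2, T = 6
labels = np.array([0, 0, 0, 1, 1, 1])
neighbours = [np.array([1]), np.array([0, 2]), np.array([1]),
              np.array([4]), np.array([3, 5]), np.array([4])]
mu_all, mu_class, mu_nb = mean_field_stats(EY, labels, neighbours)
print(mu_all[0, 0], mu_class[0, 0], mu_nb[1, 1])   # 3.0 1.5 7.0
```

In an EM loop these statistics would be recomputed from the latest $\mathbb{E}[\mathbf{y}_j]$ at every iteration before forming $\mathbf{m}_i^{(R)}$.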



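In order to complete the expectation step, we infer the first- and second-order moments of the latent posterior, defined as

$$P(\mathbf{y}_i|\mathbf{x}_i, \mathbf{m}_i^{(R)}, \Theta) = \frac{P(\mathbf{x}_i|\mathbf{y}_i, \theta)\,P(\mathbf{y}_i|\mathbf{m}_i^{(R)}, \beta)}{\int P(\mathbf{x}_i|\mathbf{y}_i, \theta)\,P(\mathbf{y}_i|\mathbf{m}_i^{(R)}, \beta)\,d\mathbf{y}_i}. \quad (26)$$

This posterior is the normalised product $\mathcal{N}(\mathbf{y}_i|\mathbf{W}^{-1}\mathbf{x}_i, (\sigma_x^{-2}\mathbf{W}^T\mathbf{W})^{-1})\,\mathcal{N}(\mathbf{y}_i|\mathbf{m}_i^{(R)}, \Sigma^{(R)})$. Its mean and second moment are then estimated as:

$$\mathbb{E}^{(R)}[\mathbf{y}_i] = \mathbf{S}^{(R)}\left(\sigma_x^{-2}\mathbf{W}^T\mathbf{x}_i + (\Sigma^{(R)})^{-1}\mathbf{m}_i^{(R)}\right), \quad \text{where } \mathbf{S}^{(R)} = \left(\sigma_x^{-2}\mathbf{W}^T\mathbf{W} + (\Sigma^{(R)})^{-1}\right)^{-1} \quad (27)$$

$$\mathbb{E}^{(R)}[\mathbf{y}_i\mathbf{y}_i^T] = \mathbf{S}^{(R)} + \mathbb{E}^{(R)}[\mathbf{y}_i]\,\mathbb{E}^{(R)}[\mathbf{y}_i]^T. \quad (28)$$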

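Having recovered the first- and second-order moments, we move on to the maximisation step. Following the approximations in (Celeux et al., 2003), the complete-data likelihood is approximated as:

$$P(\mathbf{Y}, \mathbf{X}|\Theta) \approx \prod_{i=1}^{T} P(\mathbf{x}_i|\mathbf{y}_i, \theta)\,P(\mathbf{y}_i|\mathbf{m}_i^{(R)}, \beta). \quad (29)$$

The likelihood is separated for estimating $\theta^{(R)} = \{\mathbf{W}^{(R)}, \sigma_x^{(R)}\}$ and $\beta^{(R)} = \{\sigma_{1:N}^{(R)}, \lambda_{1:N}^{(R)}\}$ as follows:

$$\theta^{(R)} = \arg\max_{\theta}\left\{\sum_{i=1}^{T}\int P(\mathbf{y}_i|\mathbf{x}_i, \mathbf{m}_i^{(R)}, \Theta)\,\log P(\mathbf{x}_i|\mathbf{y}_i, \theta)\,d\mathbf{y}_i\right\} \quad (30)$$

$$\beta^{(R)} = \arg\max_{\beta}\left\{\sum_{i=1}^{T}\int P(\mathbf{y}_i|\mathbf{x}_i, \mathbf{m}_i^{(R)}, \Theta)\,\log P(\mathbf{y}_i|\mathbf{m}_i^{(R)}, \beta)\,d\mathbf{y}_i\right\} \quad (31)$$

Subsequently, we maximise the log-likelihoods with respect to the parameters, recovering the update equations (as detailed in the supplementary material). For $\theta$, by maximising Eq. 30, we obtain:

$$\mathbf{W}^{(R)} = \left(\sum_{i=1}^{T}\mathbf{x}_i\,\mathbb{E}^{(R)}[\mathbf{y}_i]^T\right)\left(\sum_{i=1}^{T}\mathbb{E}^{(R)}[\mathbf{y}_i\mathbf{y}_i^T]\right)^{-1} \quad (32)$$

$$\sigma_x^{2(R)} = \frac{1}{TF}\sum_{i=1}^{T}\left\{\|\mathbf{x}_i\|^2 - 2\,\mathbb{E}^{(R)}[\mathbf{y}_i]^T(\mathbf{W}^{(R)})^T\mathbf{x}_i + \operatorname{Tr}\left[\mathbb{E}^{(R)}[\mathbf{y}_i\mathbf{y}_i^T](\mathbf{W}^{(R)})^T\mathbf{W}^{(R)}\right]\right\} \quad (33)$$

Similarly, maximising Eq. 31 for $\beta$ yields updates for $\sigma_n^{(R)}$. These updates depend on the expected squared deviation of each latent dimension from its mean field,

$$\mathbb{E}[y_{n,i}^2] - 2\,\mathbb{E}[y_{n,i}]\,m_{n,i}^{(R)} + \big(m_{n,i}^{(R)}\big)^2,$$

with the term $\lambda_n + (1-\lambda_n)^2$ entering for $R = \mathrm{LDA}$ and $R = \mathrm{LPP}$. For $\lambda_n$, we follow the choice described in Sec. 3.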
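A compact sketch of the EM iterations for the observation model of Eq. 21 follows. It implements the posterior moments of Eq. 27-28 and the updates of Eq. 32-33, under three assumptions:

- The prior means $\mathbf{m}_i^{(R)}$ and a shared prior covariance $\Sigma^{(R)}$ are passed in as arrays, because their exact mean-field forms depend on the chosen prior.
- The usage example fixes them to $\mathbf{m}_i = \mathbf{0}$ and $\Sigma = \mathbf{I}$, which reduces the updates to standard PPCA EM. This makes it possible to compare the learned subspace with the leading eigenvectors of the scatter matrix.
- The updates for $\beta$ are omitted.

All function names are hypothetical.

```python
import numpy as np

def e_step(X, W, sigma2_x, m, Sigma):
    """Posterior moments (Eq. 27-28). m: N x T prior means, Sigma: N x N prior covariance."""
    Sinv = np.linalg.inv(Sigma)
    S_post = np.linalg.inv(W.T @ W / sigma2_x + Sinv)     # S^(R), shared by all i
    Ey = S_post @ (W.T @ X / sigma2_x + Sinv @ m)        # columns are E[y_i]
    sum_Eyy = X.shape[1] * S_post + Ey @ Ey.T            # sum_i E[y_i y_i^T]
    return Ey, sum_Eyy

def m_step(X, Ey, sum_Eyy):
    """Updates for W and sigma_x^2 (Eq. 32-33)."""
    F, T = X.shape
    W = (X @ Ey.T) @ np.linalg.inv(sum_Eyy)
    sigma2_x = (np.sum(X ** 2)
                - 2.0 * np.sum(Ey * (W.T @ X))           # sum_i E[y_i]^T W^T x_i
                + np.trace(sum_Eyy @ W.T @ W)) / (T * F)
    return W, sigma2_x

rng = np.random.default_rng(6)
F, N, T = 5, 2, 1000
X = np.array([[3.0], [2.0], [0.5], [0.3], [0.1]]) * rng.normal(size=(F, T))
X -= X.mean(axis=1, keepdims=True)

W, sigma2_x = rng.normal(size=(F, N)), 1.0
m, Sigma = np.zeros((N, T)), np.eye(N)                   # PPCA special case
for _ in range(500):
    Ey, sum_Eyy = e_step(X, W, sigma2_x, m, Sigma)
    W, sigma2_x = m_step(X, Ey, sum_Eyy)
```

Because only $N \times N$ matrices are inverted and $\mathbf{X}$ enters through products with $N$-column matrices, each iteration of this sketch costs $\mathcal{O}(TNF)$ for $N \ll T, F$.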

4.1. Comparison to other variants of PPCA 

It is clear that, for PCA, the updates for $\theta = \{\mathbf{W}, \sigma_x^2\}$, as well as the distribution of the latent variable $\mathbf{y}_i$, are the same as in previous probabilistic approaches (Roweis, 1998; Tipping & Bishop, 1999). The only variation is the mean of $\mathbf{y}_i$, which in our case is shifted by the mean field, $(\Sigma^{(\mathrm{PCA})})^{-1}\mathbf{m}_i^{(\mathrm{PCA})}$. In addition, our method models per-dimension variance ($\sigma_n$).

4.2. Complexity 

The EM algorithm for our models is an iterative procedure for recovering the latent space which preserves the characteristics enforced by the selected latent neighbourhood. Our analysis is similar to that of PPCA (Roweis, 1998; Tipping & Bishop, 1999). For $N \ll T, F$, the complexity of each iteration is bounded by $\mathcal{O}(TNF)$. This is unlike the deterministic models, whose cost grows with a higher power of $T$. The saving arises because the covariance appears only in trace operations. It is of high value for our proposed EM-based models since, e.g., for LPP only the deterministic equivalent exists, and its complexity grows with a higher power of $T$.

5. Experiments 

As a proof of concept, in this section we demonstrate the application of our proposed probabilistic component analysis techniques on a set of synthetic data (see Fig. 1), generated utilising the Dimensionality Reduction Toolbox. In these examples, we compare against deterministic formulations of PCA, LDA and LLE. We use the same latent dimensions for the models in comparison. Note how the probabilistic projections match their deterministic equivalents. Also, in the $\mathbb{E}[\mathbf{y}]$ of LDA, the variance modelling in the latent clusters is clearly visible.

(Figure 1 panel columns: Observations | Deterministic proj. | Probabilistic E[y] | Probabilistic proj.)

Figure 1. Synthetic experiments with deterministic LLE, LDA and PCA compared to our proposed probabilistic methods. For the deterministic models, the projections are shown in the 2nd column. For our probabilistic equivalents, we show E[y] (3rd column) along with the projections (4th column). A neighbourhood of 12 was used in the case of LLE.

6. Conclusions 

In this paper we introduced a novel, unifying probabilistic component analysis framework. It reduces the construction of probabilistic component analysis models to essentially selecting the proper latent neighbourhood via the design of the latent connectivity. Our framework can thus be used to introduce novel probabilistic component analysis techniques by formulating new latent priors as products of MRFs. In this work, we have shown specific priors which, when used, generate probabilistic models corresponding to PCA, LPP, LDA and SFA. By means of theoretical analysis and experiments, we have demonstrated various advantages that our proposed methods offer over existing probabilistic and deterministic techniques. Moreover, to the best of our knowledge, we introduce the first probabilistic equivalent of LPP and discuss the first EM algorithm for SFA.